VACE (All-in-one Video Creation and Editing): a unified generation/editing framework that brings together text, images, video, masks, and various control signals. It uses a single input format to perform multi-task video generation and editing.
Wan2.1 VACE (based on the Wan2.1 generator)
Goal: unify multi-modal conditions for video generation and editing (T2V, I2V, V2V, local editing, etc.) behind a single "one model, many tasks" interface; a sketch of what such a unified condition format could look like follows.
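A minimal illustration of the "single input format" idea: every task fills the same (text, frames, masks, references) slots and leaves unused slots empty. The class and helper names below are illustrative assumptions, not VACE's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

# Sketch only: field names are assumptions, not the official VACE interface.
# The point is that T2V, local editing, and reference-driven generation all
# reduce to the same condition tuple, with unused slots left empty.

@dataclass
class UnifiedCondition:
    prompt: str
    frames: Optional[np.ndarray] = None         # (T, H, W, 3) context/driving video, or None
    masks: Optional[np.ndarray] = None          # (T, H, W); 1 = regenerate, 0 = keep
    reference_images: List[np.ndarray] = field(default_factory=list)  # identity/appearance refs

def make_t2v(prompt: str) -> UnifiedCondition:
    # Pure text-to-video: no frames, masks, or references.
    return UnifiedCondition(prompt=prompt)

def make_local_edit(prompt: str, video: np.ndarray, region: np.ndarray) -> UnifiedCondition:
    # Local editing: keep pixels where region == 0, regenerate where region == 1.
    return UnifiedCondition(prompt=prompt, frames=video, masks=region)

def make_ref_driven(prompt: str, ref: np.ndarray, pose_video: np.ndarray) -> UnifiedCondition:
    # Reference + control: appearance from `ref`, motion from parsed pose frames.
    return UnifiedCondition(prompt=prompt, frames=pose_video, reference_images=[ref])
```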
Typical usage: provide a reference image (to preserve identity/appearance) together with a driving video or its parsed control signals (e.g., a pose sequence, a trajectory, or time-varying depth/edge maps) to produce a new video that follows the control signal while keeping the reference's appearance. The VACE/Fun family exposes these temporal control interfaces along with the unified task support; a hedged usage sketch follows.
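A sketch of the reference-plus-control workflow, assuming the WanVACEPipeline integration in diffusers; the model ID, call parameters, file names, and resolution defaults are assumptions and may differ across library versions, so check the installed diffusers documentation for the exact signature.

```python
# Drive a reference image with a pre-parsed pose video (assumed diffusers integration).
import torch
from diffusers import WanVACEPipeline
from diffusers.utils import export_to_video, load_image, load_video

pipe = WanVACEPipeline.from_pretrained(
    "Wan-AI/Wan2.1-VACE-1.3B-diffusers",  # assumed checkpoint name
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

reference = load_image("reference_character.png")   # identity/appearance to preserve
pose_frames = load_video("driving_pose.mp4")         # parsed control signal (e.g., pose sequence)

frames = pipe(
    prompt="the character from the reference image dancing on a beach at sunset",
    video=pose_frames,                # temporal control signal
    reference_images=[reference],     # appearance/identity condition
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=30,
).frames[0]

export_to_video(frames, "vace_output.mp4", fps=16)
```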